All Questions

5 votes
2 answers
3k views

SGDClassifier fit and partial_fit functions

What is the correct way to train the SGDClassifier model on new data observations? Should I use the fit function or the ...
DPascal
2 votes
1 answer
72 views

Are there deduplication algorithms that do not work on a metric space?

Recently I became interested in the process of data cleansing, and specifically in record linkage. Thus far I have read about deterministic and probabilistic approaches to deduplicating data sets and to some ...
Imago
6 votes
0 answers
118 views

Fixed-radius range search in non-Euclidean space

I'm trying to find the indexing data structure most suitable for my metric space: a set of IP-network-related data (IP addresses, ports, TCP flags, ...); the distance function is continuous, non-Euclidean ...
Jan Wrona
0 votes
2 answers
109 views

Data Science Companies [closed]

I'm interested in the data science market. I expected there to be a lot of companies that build algorithms and models for other companies, as in Kaggle competitions. But I struggle to find any....
tengo
2 votes
3 answers
2k views

Finding outliers in multiple dimensions

I'm working on a dataset that isn't normally distributed. The dataset contains three dimensions, such as cost, discount and profit. I'm trying to find possible outliers across all these dimensions. I used ...
tourist
4 votes
3 answers
5k views

How to explain decision tree algorithm in layman's terms?

I have a task at hand where I have to explain the decision tree algorithm to a person who does not have much understanding of ...
user2966197
2 votes
1 answer
266 views

Time Complexity notation in Big Data platforms

I am redesigning some of the classical algorithms for the Hadoop/MapReduce framework. I was wondering if there is any established approach for denoting Big(O)-style expressions to measure time complexity? ...
Mohitt
2 votes
1 answer
2k views

Optimizing Weka for large data sets

First of all, I hope I'm in the right StackExchange here. If not, apologies! I'm currently working with huge amounts of feature-value vectors. There are millions of these vectors (up to 20 million ...
lennyklb
1 vote
4 answers
6k views

Small project ideas for Machine Learning [closed]

I need some serious help. I am supposed to implement a project (not yet chosen) for my Machine Learning course. I have no background in AI, data mining or machine learning. I have been searching ...
Shamy
10 votes
2 answers
2k views

Scalable Outlier/Anomaly Detection

I am trying to set up a big data infrastructure using Hadoop, Hive, Elasticsearch (amongst others), and I would like to run some algorithms over certain datasets. I would like the algorithms ...
doublebyte
